Data Summaries vs. Actionable Insights: Where You Can Trust AI
Part 3 of the AI vs. Human: A User Research Showdown series
Summary
Numerous new platforms claim to use AI to accelerate or even replace traditional user research. We put these claims to the test in a head-to-head study comparing the effectiveness of AI tools and human UX researchers throughout the generative research process. While AI tools sped up note-taking, data entry, and participant summaries, we ultimately found that a human researcher was absolutely necessary for thorough data analysis and meaningful insights.
Study Recap
In the rapidly evolving world of user experience research, the emergence of novel artificial intelligence tools presents a new frontier. As we stand on the cusp of potentially transformative changes, the question arises: How might AI impact traditionally human-led endeavors like generative UX research?
At Brilliant Experience, we performed an empirical study to directly compare the effectiveness of AI models and human researchers in conducting qualitative interviews. Our study centered on parents of young children planning international travel, examined under four distinct conditions, each varying in AI and human involvement. Across these scenarios, the researchers (or AI tools) produced standard deliverables: a slide deck report of key insights, a set of personas, and corresponding experience maps for each persona. We evaluated each approach not only for output quality but also for efficiency and depth of understanding.
Read more about our method and goals in our Executive Summary.
This Edition: Insight Generation
In this post, we’ll delve into our findings from the synthesis phase, starting with our experience generating key insights from the data in each condition.
Large Language Models (LLMs), such as the one powering ChatGPT, are built to consume and synthesize large volumes of text, making them seemingly ideal candidates for analyzing UX research data.
To test this potential, we evaluated AI tools across several synthesis tasks, including data entry, analysis, and ultimately insight generation. In the sections below, we’ll explore how these tools stacked up against trained UX researchers.
Choosing an AI Tool to Identify Key Insights
There are two main types of tools you can use for data synthesis: general AI tools like ChatGPT and specialized research synthesis tools designed specifically for analyzing qualitative research data.
Using General AI Tools for Insight Generation
General chatbot-style AI tools like ChatGPT can be leveraged for synthesis by uploading your transcripts and/or session notes and asking them to outline the key insights from the study. If you are using one of these tools to assist with any part of synthesis, it is critical that you use one with a “closed” AI model that doesn’t use your research data for further training; otherwise, you could put your participants’ privacy in jeopardy.
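To make this concrete, here is a minimal sketch of that workflow using OpenAI’s Python SDK. The model name, prompt wording, and file path are illustrative assumptions rather than a recommendation, and you should confirm your provider’s data-retention and training policies before uploading any participant data:

```python
# Minimal sketch: ask a general LLM for the key themes in one transcript.
# Assumptions: the openai package is installed, OPENAI_API_KEY is set,
# and your account/plan does not use submitted data for model training.
from pathlib import Path

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

transcript = Path("session_01_transcript.txt").read_text(encoding="utf-8")

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative; any capable chat model works
    messages=[
        {
            "role": "system",
            "content": (
                "You are a UX research assistant. Summarize the key themes "
                "in this interview transcript as a bulleted list, and flag "
                "statements about pain points or unmet needs."
            ),
        },
        {"role": "user", "content": transcript},
    ],
)

print(response.choices[0].message.content)
```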
Using Research Synthesis Tools for Insight Generation
Research synthesis tools like Next, Marvin, and Research Studio are specifically designed to analyze qualitative UX research data. Major research repository tools like Dovetail and Condens are now also incorporating AI features promising to speed your analysis.
There are three main considerations for choosing one of these tools:
Allowed File Types: Some tools allow both videos and transcripts, while others accept only one or the other.
Synthesis Functionality: These tools vary widely in their capabilities, so it is worth testing them extensively before committing to one. Common AI features include transcription, tagging, affinitizing, and summarizing.
Cost: These tools can be expensive, especially those that function as a research repository. You will often need to sign a contract for an enterprise license to access the full feature set.
AI vs. Human Comparison: Insight Generation
Imagine a world in which you could have meaningful insights to act on immediately after data collection. Synthesis is usually one of the most time-consuming parts of a study, as it involves meticulously entering, reviewing, tagging, and affinitizing your data and then figuring out what it all means. It is also one of the most highly skilled parts of the research process: researchers spend years honing their craft, and the quality of the insights depends largely on the experience and skill of the researcher. But could that all change, given how well-suited AI seems to be to qualitative analysis?
The short answer is “no.” While we found AI tools to be incredibly useful for assisting with parts of the process, such as summarization and data entry, they ultimately couldn’t handle insight generation on their own. We’ll dig into why in the following sections.
AI Win: High-Level Summaries
This is where AI truly excels. AI platforms, both general tools like ChatGPT and specialized research tools, were highly effective at processing the transcripts or notes from the study and summarizing the most common themes. Regardless of the tool used, these summaries were generated almost instantaneously—something that just wasn’t possible before the advent of AI.
Note that we found the summarization to be much more effective with (human researchers’) notes than transcripts, likely due to the more structured nature of notes.
These summaries provided an excellent high-level understanding of the data before diving into more detailed analysis, which was still necessary (as you’ll see below).
AI Win: Research Interview Summaries
In addition to summarizing entire datasets from our study, we discovered a valuable use case for AI in summarizing individual interviews. By inputting individual transcripts into AI tools, we obtained general summaries of each interview as well as detailed summaries on specific topics. These AI-generated summaries significantly expedited our data entry process, as we used them to populate our datasheet.
However, a thorough human quality control (QC) process was essential, as the AI occasionally missed important details and sometimes produced inaccurate information.
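As a rough sketch of how this workflow can be automated, the loop below summarizes each transcript in a folder and writes the results to a datasheet. The folder layout, model, and prompt are hypothetical, and the empty QC column is deliberate: a human still has to verify every row.

```python
# Hedged sketch: batch-summarize interviews into a CSV "datasheet".
# Assumes transcripts live in ./transcripts/*.txt; adapt paths and
# prompt wording to your own study. Human QC of each row is required,
# since the model can miss details or produce inaccurate information.
import csv
from pathlib import Path

from openai import OpenAI

client = OpenAI()

def summarize(transcript: str) -> str:
    """Ask the model for a short, structured summary of one interview."""
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[
            {
                "role": "system",
                "content": (
                    "Summarize this user interview in five bullet points, "
                    "covering trip-planning habits, pain points, and tools used."
                ),
            },
            {"role": "user", "content": transcript},
        ],
    )
    return response.choices[0].message.content

with open("datasheet.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["participant", "ai_summary", "qc_reviewed"])
    for path in sorted(Path("transcripts").glob("*.txt")):
        summary = summarize(path.read_text(encoding="utf-8"))
        writer.writerow([path.stem, summary, ""])  # blank until a human checks it
```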
Human Win: Data Tagging
One of the more labor-intensive tasks in research synthesis is the process of tagging data to extract key findings and themes. This often requires meticulously reviewing transcripts line by line to highlight pertinent information or sifting through data spreadsheets to categorize notes. Ideally, AI technology could automate this process, but current capabilities fall short of this need.
Although numerous AI tools claim to assist with tagging, their functionality is highly limited and results are often inaccurate. Some platforms only tag positive or negative statements, which was too limited for our purposes. Others promise to tag by topic (either suggested themes or ones that you provide), but our extensive testing of these tools revealed disappointing accuracy rates and significant room for improvement in the technology.
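For readers who want to experiment anyway, here is a rough sketch of the tag-by-provided-topics approach using a general LLM. The tag list, model, and prompt are hypothetical, and, consistent with our testing, outputs like these still need line-by-line human review:

```python
# Hedged sketch: constrain the model to a fixed tag vocabulary and ask
# for machine-readable output. Tags and prompt are illustrative only;
# accuracy in our testing was disappointing, so treat results as drafts.
import json

from openai import OpenAI

client = OpenAI()

TAGS = ["booking", "packing", "documents/visas", "traveling with kids", "budgeting"]

def tag_statement(statement: str) -> list[str]:
    """Return the subset of TAGS the model believes applies to a statement."""
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        response_format={"type": "json_object"},  # force parseable JSON
        messages=[
            {
                "role": "system",
                "content": (
                    "Tag the user's statement with zero or more of these topics: "
                    f"{TAGS}. Respond with a JSON object of the form "
                    '{"tags": [...]}.'
                ),
            },
            {"role": "user", "content": statement},
        ],
    )
    return json.loads(response.choices[0].message.content)["tags"]

print(tag_statement("The visa paperwork took so long we almost missed our flight."))
```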
Human Win: Actionable Insights
To determine whether AI can truly deliver meaningful insights, we compared the human-generated insights (Human Only and AI Assisted conditions) with the AI-generated insights (AI Only and AI Moderated conditions) produced in our study. Ultimately, we found that AI tools couldn’t compete with human researchers when it comes to data synthesis.
Looking at the numbers first, our comparison revealed considerable overlap between the insights produced by AI and those produced by humans. However, humans identified more unique key insights. This suggests that while AI can replicate some aspects of human analytical capabilities, it lacks the ability to uncover deeper, novel insights.
But as we all know, quality is more important than quantity when it comes to actionable research insights. For this reason, we had four external product development experts assess the quality of insights from each condition in a blind test.
Reviewers consistently rated the AI Only condition as the least effective, criticizing the insights for being superficial and not actionable: more summaries than true insights. (We like Nikki Anderson’s definition that insights “reveal to us the underlying motivations behind behavior and help us understand what happened, why it happened, and what the potential consequence is of not addressing the insight.”) We agreed with the reviewers: the AI-generated insights often lacked clarity, depth, novelty, and business application.
The AI Assisted condition, where a human researcher utilized AI tools, outperformed the other conditions overall. This suggests a promising role for AI in augmenting human analytical tasks. The AI Moderated condition also performed well, further underscoring the promise of AI tools (though we caution that the insights in this condition were generated by a unique, full-featured, and pricey tool, which may not be representative of AI’s synthesis capabilities overall).
Human Honorable Mention: Pulling Quotes
One additional task that we felt was important to discuss is the process of identifying representative quotes from interviews to support findings and insights. This process is essential but can be quite laborious. We were thus excited at the prospect of providing an AI tool with our study transcripts and asking it to quickly pull representative quotes, but AI proved surprisingly ineffective at this task: tools like ChatGPT frequently generated fictitious quotes and consistently failed to provide accurate excerpts from our transcripts.
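If you do experiment with AI for this task, a simple safeguard is to confirm that every candidate quote appears verbatim in the source transcript before it reaches a report. A minimal sketch, with a hypothetical file name and quote (no LLM is required for the check itself):

```python
# Guard against fabricated quotes: normalize whitespace and smart quotes,
# then require the candidate to be an exact substring of the transcript.
import re

def normalize(text: str) -> str:
    """Lowercase, straighten curly quotes, and collapse whitespace."""
    text = text.replace("\u201c", '"').replace("\u201d", '"').replace("\u2019", "'")
    return re.sub(r"\s+", " ", text).strip().lower()

def quote_is_verbatim(quote: str, transcript: str) -> bool:
    return normalize(quote) in normalize(transcript)

transcript = open("session_01_transcript.txt", encoding="utf-8").read()
candidate = "We always pack a separate bag just for the kids' snacks."
if not quote_is_verbatim(candidate, transcript):
    print("WARNING: quote not found verbatim; it may be paraphrased or invented.")
```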
When to Involve AI in Insight Generation
Given the findings of our study, we are starting to thoughtfully incorporate AI into our synthesis process in the following ways:
Speeding up our data entry by summarizing individual interviews
Providing high-level summaries at the end of each week of data collection to give us a bird’s eye view of the results
Brainstorming business opportunities related to our human-generated insights
As AI tools continue to progress, we look forward to leveraging them for data tagging and pulling key quotes.
Conclusion
AI tools seem like a perfect fit for the data synthesis phase, as they can consume and analyze large amounts of text. Unsurprisingly, there are many, many tools promising to help you analyze your qualitative research data. In our head-to-head comparison of these tools and human researchers, we found that when it comes to qualitative data synthesis:
AI excels at summarizing large datasets quickly, providing a useful snapshot of main themes and trends and helping researchers speed data entry.
AI tools, at the time of this writing, are not able to generate novel, meaningful insights at the level of human researchers.
For these reasons, we consider AI a “useful assistant” when it comes to insight generation. We are more likely to leverage AI in the study planning and data collection phases, where we graded it a “trusted partner” and “skilled researcher”, respectively.
Interested in learning how AI tools stack up against human researchers when creating personas and experience maps? Subscribe to our newsletter to make sure you don’t miss an issue of our AI vs. Human series – or our ongoing AI 4 UX video interview series featuring founders of some of the most popular AI tools for UX research.